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Abstract. In this report we consider the parameterization of low-dimensional manifolds that arc 
specified (approximately) by a set of points very close to the manifold in the original high-dimensional 
space. Our objective is to obtain a parameterization that is (1-1) and non singular (in the sense that the 
j^jfjj Jacobian of the map between the manifold and the parameter space is bounded and non singular). 
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1. Introduction. When it is known that the high-dimensional solutions of a compu- 
tational problem lie on a lower-dimensional manifold we may wish to operate only on that 
manifold. By repeatedly solving the problem numerically, we can compute points that are 
approximately on the manifold and we will assume that we are able to compute a set of 
| such points over the manifold so that they are "reasonably distributed" (i.e., the average 

density of the points does not vary too greatly from region to region. We also assume that 
we can approximate any point on the manifold sufficiently accurately by interpolation from 
these pre-computed points and we would like to find a convenient parameterization of the 
manifold that can be used for the interpolation. 

For example, if the solutions of a differential equation in the high-dimensional space 
rapidly approach a low-dimensional manifold it may be desirable to integrate a reduced 
system on the manifold to avoid stiffness. In this case, it might be even more convenient 
to integrate in the parameter space. Note that in this example it is important that the 
, parameterization be non singular: otherwise the differential equations in the parameter 

space would have a singularity. 

We suppose that we are given a set of N points Xj, i = 1 , • • • , N in s-dimensional 
Euclidean space, S, with coordinates Xj = {x q i\, q = 1, ■ • • , s that are on (or close to, due 
to noise or computational error) a low-dimensional manifold, M. Our ultimate objective is 
to find, computationally, a parameterization of any manifold that can be simply param- 
eterized. For now we rule out manifolds such as tori that can only be parameterized by 
using periodicity in some of the parameters. A desiderata is to be able to parameterize any 
manifold that can be formed by "twisting" (without any stretching) a linear manifold in the 
high dimensional space (such as the well-known "Swiss roll" example in Figure [OT b)) so 
that the parameterization is a good approximation to a Cartesian coordinate system for the 
original linear manifold. (In that figure the X-axis is foreshortened and has a range of about 
2.5 times that of the other two axes so as to show the structure. Consequently it appears 
to be rolled from a fairly narrow rectangle but it is actually rolled from the square shown 
in Figure UTTT a). The randomness of these points was constrained to get a "reasonable" 
distribution - the 1 by 1 square was divided into 40 by 40 equal smaller squares, a point was 
placed in the middle of each and then subject to a random uniform perturbation in each 
coordinate in the range [-0.4, 0.4]/40, thus ensuring that no points were closer than 0.2/40.) 

The learning and statistical communities have long been interested in manifold learning 
and representation so there is a large body of literature. In those cases the typical task 
is to start with the data set {X^} and produce a "representation" in a lower-dimensional 
space such that the differences in the distances of pairs of points in the two spaces are 
suitably small. This was achieved by minimizing a stress function in Sammon's early paper[5] 
and numerous modifications have been proposed, such as [5J |B]. If the low-dimensional 
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manifold is linear, the Principal Component Analysis (PCA) is the standard way to identify 
and represent it - the principal components provide an orthogonal basis for the smallest 
linear space containing the manifold. Many methods for non-linear manifolds start with 
some modification of PCA. For example, [8] effectively uses PCA locally and then stitches 
the tangent spaces together. Local Linear Embedding (LLE) [2] looks for a local linear 
approximation which is related to PCA. 

Another class of methods is based on the structure of the graph generated by focussing 
attention primarily on nearby neighbors. Diffusion Maps pQ are one type of this method. 
We have looked at the use of Diffusion Maps for the parameterization of slow manifolds 
[4]. Diffusion Maps use a parameter, e, that effectively specifies the distance of what is 
considered a nearby neighbor. When a small e is used, the parameterization is nearly 
singular at the boundary. A much larger e sometimes overcomes this difficulty, but can lead 
to other difficulties. 

In this report we show that that the limit of a particular case of diffusion maps for large 
e is PCA if we use the left eigenvectors rather than the right eigenvectors normally used in 
diffusion maps. However, because, in diffusion maps, we work with distances in the graph 
structure rather than with the coordinates {x^} that are the input to PCA, we can consider 
modifications to the distances of pairs of points that are not nearby neighbors to improve 
the parameterization. 

In Section [2] we discuss the problems of diffusion maps for the data set in Figure fTTlT b) 
- for small e the map has a nearly singular Jacobians, but breaks down for large e because 
points distant in the manifold are relatively close in the high-dimensional space. Then, in 
Section [31 we consider the large e limit for diffusion maps on linear manifolds and prove that 
the left eigenvectors provide the PCA decomposition and show that this can be obtained 
by working directly with the distance matrix. In Section [4] we consider ways to modify the 
distance matrix to improve the parameterization. 

2. Diffusion Map Issues. Define the (symmetric) distance matrix H to have the 
entries 



Hy — d(Xi, Xj) 
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where d(., .) is the Euclidean distance between the points, that is 
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The diffusion map method [T] now forms the matrix W = {W,,} = {w(Hij)} where w is 
a suitable non-negative function such that w(0) = 1 and w(h) is monotonic non-increasing 
for ft, > with w(oo) = 0. We will use 



The diffusion map method continues by dividing each row of W by its corresponding row 
sum to get a Markov matrix K whose largest eigenvalue is 1 with a corresponding eigenvec- 
tor e = [1, 1, • • • , 1] T . Some of its next few eigenvectors serve to identify the low-dimensional 
manifold and the i-th components of these eigenvectors provide a parameterization for the 
points, Xj, on the manifold. It is well known that for small e the parameterization is poor 
near the boundaries - see, for example, [3] - but that a large e can give a good parameteri- 
zation under some circumstances. 

In Figure 12.11 we show the results of applying a diffusion map to the data in Figure 11.11 
(b). Here, e was chosen as y/2f3 where /? was the smallest value such that if all edges in 
the graph corresponding to the distance matrix H longer than j3 were removed, the graph 
would still be connected. (For this particular data set, (3 is 0.0322.) Figure |2~T1 (a) and (b) 
plot the components of the second and third eigenvectors, £2 and £3, against the X and 
ye coordinates of the points in Figure ll.lf b) . (X and ye are the coordinates of the points 
in Figure ll.lf a) . After the square is "rolled" ye is the arc length around the spiral from 
its inner stating point while X is the coordinate along the length of the cylindrical spiral.) 
We see that £2 and £3 come very close to providing direct parameterizations of X and ye, 
respectively, and this is shown more clearly in Figure 12.11 (c) and (d) (which are projections 
of the previous two figures onto axial panes). However, we see that the parameterization is 
very poor near the boundary where 8^2 / dX and 8^3 / dye are nearly zero (because the curve 
is close to a cosine half cycle). 

In some cases, this can be overcome by using a large e. In Figure 12.21 (a) and (b) we 
show the equivalent of Figure l2~Tl (c) and (d) for a diffusion map of the same data with e = 1 
- the distance along the cylinder. Note that £2 provides a very good parameterization of the 
X direction - it is almost linear, but that £3 is no longer a (1-1) map. This occurs because 
W,j for points on adjacent turns of the spiral is about 0.985 so the points are seen as close. 
(The spiral started with radius 0.03 and increased radially by 0.12 on each full turn.) 

We would like to get the benefits of a large e - the linearity in Figure 12.2( a) without 
the drawbacks illustrated in Figure l2~27 b). For that reason, we study the large e limit of 
diffusion maps in the next section. 

3. Diffusion Maps in the Large e Limit. We are really interested in non-linear 
manifolds, but for expository purposes we start by assuming that the manifold is linear. Set 
the (arbitrary) origin to be at the point mean. This implies that 
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(3.1) 



for all q. Since the manifold is linear, it is now a linear subspace, D. 

3 




Fig. 2.1. Diffusion map applied to data in Figure Tl.lY b) with e = 0.0455. (a) Second eigenvector £2- 
(b) Third eigenvector £3. (c) £2 versus X. (d) £3 versus yg. 
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Fig. 2.2. Diffusion map applied to data in Fiaure Vl . llf b) with e = 1. (a) £2 versus X. (b) £3 versus yg. 



Define x g to be the vector whose z-th component is x q ^ and e — [1, 1, ■ ■ • , 1] T so eq 
(|3.ip can be written as 

e T x g = (3.2) 

for all q. (Bold face lower-case roman letters will always refer to TV-dimensional column 
vectors while non-bold face roman letters will refer to s-dimensional column vectors.) 
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Suppose e is large enough that (max(H,j)/e) can be ignored. In that case, 
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The row sums of this matrix are 

AT „ AT 



i=i i=i 



so 



11 i j! 

K^-(1--[H^-£h^]) (3.3) 



In other words, 



fc=i 



K«1(E-F) 

where E is the matrix of all ones and 

1 1 N 

F i:j = -^[11% - ^5I H ^]) 

fc=i 

s JV 

q=l fc=l 
s N 

= - 2x i,i x i,j - J2 x lk/ N )- ( 3 - 4 ) 

9=1 k=l 

Writing c q for X^fcli 2:2 fc an d defining vector v g to have i-th entries x 2 q i , eq p.4j) can 
be written as 

F = D e(v ^ Cge)T -^- ( 3 - 5 ) 

9=1 

The points Xj lie in the linear subspace D. Chose a coordinate system for S such that 
the first d coordinates span D. Hence, for q > d x g = v g = and c q = so we can rewrite 
eq (|3.5[) as 

[ ^ g q ' -2x g x^]. (3.6) 

9=1 

Note that if a left or right eigenvector, v of K given by eq (|3.3[) satisfies e T v = then 
it is also an eigenvector of Fand vice versa. 
Observation 

If the points {Xi} are regularly placed in the linear subspace - that is is they lie on 
the points of a regular rectangular grid or its extension to higher dimensions - and if the 
coordinate system is aligned with this grid, then x g , q = 1, • • • , d are eigenvectors of F and 
hence K. 



This follows by construction. In this case the vector x 9 takes the form 

x g = ei ® e2 (8 • • • ® e 9 _i ® z 9 ® e 9+ i ® • • • (8 e<j (3.7) 

where is a column vector of ones of length equal to the number of grid points in the z-th 
dimension and z q is the set of coordinate points in the g-th dimension. Note that e q z q = 
for all q so that the x g are mutually orthogonal and all orthogonal to e. Hence 

Fx^E[ eK ^ e)T -^xJ]x, 

P =i 

d T 
GV 

= £-ftT x,-2(x^x,)x g . (3.8) 

p=i 

If p 7^ g then vjx g = directly from cq p. 71) so it only remains to show that for this 
data, 

N 

V q X 1 = X li = 

i=l 

which is true by virtue of the regular spacing and e T x f/ = 0. 

If the points are "reasonably" distributed, then this property is approximately true. In 
Figure [3~T1 we show the points plotted via their entries, £2,-1 and £3^ of the eigenvectors of K 
for the planar data in Figure [TTTt a). Because the eigenvalue computation finds the dominant 
axis as second leading eigenvector (after the eigenvector e) and that tends towards one of 
the diagonals when the data is random, the square has been rotated. To estimate how far 
this is from being an exact parameterization of the data, we computed the best (in the least 
squares sense) affine transformation from the (£2, £3) - pl anc to the (X, yg)-planc of Figure 
ll.lf a) and computed the maximum L2 error in the (X, j/g)-plane over all points. The result 
was 1.7899 x 10 -14 - only barely larger than round-off errors. 
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Fig. 3.1. Points plotted by right eigenvectors o/F. 



However, it is more important to note that the left eigenvectors of F always have this 
property, regardless of the distribution of points, so we propose to use the left eigcnsystem 
of F but to avoid having to repeat "left eigenvector" we will instead work with G = F T 
whose eigenvectors u are the left eigenvectors of F. 
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We are dealing with vectors in two different spaces: the s-dimcnsional original Euclidean 
space containing the points whose coordinates are Xj = \x\ i,x% t i, ■ ■ ■ ,x Sj i] T and the N- 
dimcnsional space, P, containing the vectors x g = [x q ,i, x q .2, ■•• , Xq.^Y '■ If wc pick any basis 
for the Euclidean space (which has to be orthogonal so that eq (|3.4[) is based on the distance) 
it determines the entries in the iV-dimensional vectors x g . For each orthogonal set of vectors 
in the Euclidean space (which we will call geometric vectors) there is a corresponding set 
of vectors of the point coordinates (which we will call point vectors). In particular, if wc 
pick an (orthogonal) basis set for S such that the first d basis vectors span D, then the x q ^ 
satisfy x q: i = 0,q > d. Hence, the remaining s — d basis vectors do not affect the d vectors 
x <?j 1 < q < d. In this sense, to every orthogonal basis for D there is a corresponding set of 
d linearly-independent point vectors in P. 

To avoid verbal complexity, we will refer to "non-zero eigenvectors" to mean "eigenvec- 
tors corresponding to non-zero eigenvalues" and its obvious extensions. 

We find that G has the interesting properties summarized in 
Theorem 

// the components of G are given by 

G=B K T )eT - 2x ^ ^ 

9=1 

then 

1. G has exactly d non-zero eigenvalues. 

2. The zero eigenvalue has algebraic multiplicity N — d and geometric multiplicity at 
least N — d — 1 although there are special cases where the geometric multiplicity is 
also N — d. 

3. There exists a geometrically orthogonal basis for D such that the d corresponding 
non-null point vectors x g are the non-zero eigenvectors of G. 

4. These eigenvectors are orthogonal. 

5. The corresponding eigenvalues are — 2x^x 9 so that all non-zero eigenvalues are 
negative. 

Proof 

We restrict ourselves to coordinate systems in S such that the first d cordinates span 
M. From eq (|3.9j) we see that if b is any vector orthogonal to e and x g , q = 1, • • ■ , d then 
Gb = so b is a zero eigenvector. Hence any s — d — 1 (linearly independent) vectors that 
span the space orthogonal to e and x g , q = 1, ■ ■ • , d are independent zero eigenvectors and 
are orthogonal to e. Note that e T is a zero left eigenvector of K. Since it is orthogonal to 
the s — d—1 zero right eigenvectors just enumerated, there must be one more zero eigenvalue. 
(Usually its right eigenvector is a generalized eigenvector but there are special cases when 
this is not so, for example, when N — 1 = d = s.) 

We complete the proof by showing that there are d independent non-zero eigenvectors. 

Let X be the matrix whose columns are x q , q = 1, • • • , d, and let y = Xq be a linear 
combination of those vectors where a is a d-dimensional vector. Hence from eq ()3.9j) and 
e T X — we have 

Gy = -2XX T Xa (3.10) 

Define 



Y = XX T . 
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(3.11) 



It is a d by d symmetric matrix so has real eigenvalues and mutually orthogonal eigenvectors. 
Because the points are in the d-dimensional subspace D and no lower dimensional subspace, 
Y is positive definite so all of its eigenvalues are strictly positive. Let a be one of the 
eigenvectors with corresponding eigenvalue fj,. Hence 

Gy = -2/iXa = —2fiy 

so y = Xa is an eigenvector of G with eigenvalue — 2fj,. This provides a complete set of 
d non-zero eigenvectors. If Q is an orthonormal matrix whose columns are the (scaled) 
eigenvectors, {a}, of Y then Q can be applied to the initially selected Cartesian coordinate 
system for D to get a new Cartesian coordinate system with property 3. 

Property 4 follows because the action of G defined by eq (|3.9|) on the subspace spanned 
by {x 9 } is exactly the same as the action of 

d 

G = -2^Tx 9 x^ (3.12) 

9=1 

G is a symmetric matrix so its eigenvectors are orthogonal. Property 5 follows from eq (|3.9|) 
when we choose a basis for D such that the eigenvectors are {x g } (we have demonstrated 
how to do above). Then Y is diagonal with entries x^x g which are its d eigenvalues n q . 

As a footnote to this proof we observe that the number of non-zero eigenvalues follows 
quickly from a paper of nearly 75 years ago by Young and Householder [7]. They show that 
the matrix 

H e 

e T 

has rank d + 2 (see eq. (2) in their paper). If we subtract the average of the first N rows 
from the last and then add the new last row to each of the first N rows (not changing the 
matrix rank) we get the matrix 

" E + G 

z T -1 

where z is some iV-dimcnsional vector whose value does not affect the rank. Hence, E + G 
has rank d+1. Since e T is a left eigenvector of G and E with eigenvalues and 1 respectively 
and E has rank 1, the rank of G is d. 

4. Modifying Distances to get a Good Parameterization. We see from eq (|3.9[) 
that the non-zero eigenvectors of G are the eigenvectors of XX T and these are precisely the 
principal components from PC A so the method proposed in Section [3] provides exactly the 
PCA representation of the data set, which raises the question "What is the advantage of 
this method over PCA?" PCA starts with the coordinates of the data whereas this method 
starts with the distances of each point pair (and incidentally provides the coordinates in an 
orthogonal coordinate system from the calculation). The problems with the diffusion map 
parameterization (using large e as discussed in Section [5]) arose when distances that were 
large in the manifold were small in the original space. Because the algorithm only uses dis- 
tances we can consider modifying the distances of more distant objects to be approximately 
compatible with those in the manifold. 

A procedure for doing this is as follows: 
1. Remove all edges from the graph representation of the distance matrix H longer 
than some value. 
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2. Compute the new shortest path to all edges to get a modified distance matrix 

3. Use this modified matrix to generate a parameterization using the technique of 
Section [3] 

In Figure l4~TT a) we show the points plotted in the (£i,£2)-plane using the modified 
distance matrix approach for the Swiss roll data in Figure fOT b). Edges were removed from 
the graph that were longer than 2/3 where j3 was the minimum value that did not lead to a 
disconnected graph when all larger edges were cut ((3 is 0.0322). As happened in Figure 1331 
the square is twisted. As before, we found the best affine transformation of these points to 
match the data in Figure [TTTT a). The new coordinate systems is called <f>i and <f>2- In Figure 
4.11 (b) and (c) we show the plots of X versus 0i and yg versus (f> 2 - As can be seen, the 
relationship is almost linear. The largest distance between the rotated data and the data 
on the original square before it was "Swiss rolled" was 1.7899 x 10~ 14 , again, barely larger 
than round off. 
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Fig. 4.1. Modified distance methods applied to Swiss roll data, (a) Points plotted by right eigenvectors 
of G. (b) and (c) Plot of rotated coordinates of (a) against coordinates in original plane data. 



The distance modification scheme gives good results for the Swiss roll data because it 
was obtained by twisting a linear manifold without stretching. However, if it is necessary to 
stretch the data to "unfold" it into a linear manifold, the scheme will not work unless some 
of the nearest neighbor distances are also modified. This can be seen with the example in 
Figure 14.2( a) which consists of a set of points somewhat uniformly placed on the majority 
of a surface of a sphere. In this example, the surface of the sphere is included from the 
"North pole" to to Southern latitude 67.5°. A point was placed at the North pole and then 
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points were place on each of 15 circles of constant latitude at constant separations of 10.5°. 
The points were equi-spaced on each circle. 40 points were placed on the equator and the 
number on each of the other circles was chosen to make the spacing on each circle as similar 
as possible except that a minimum of 6 points were placed on each circle. The location of 
one of the points was chosen from a uniform random distribution. In Figure [4~27 a) all points 
are plotted as dots except for the points on the southern-most latitude which are circles. 
Clearly this 2D surface can be parameterized easily. The southern-most latitude circle is 
the boundary of the finite, non-linear manifold. 

The results of the standard diffusion map and the modified distance scheme arc shown 
in Figure l4~2l (b) and (c), respectively. The plots shown the circles of constant latitude in 
the eigenvalue coordinates. The southern-most circle is shown as a thicker line. 



(a) 




Fig. 4.2. (a) Original data on most of the surface of a sphere, (b) Diffusion Map applied to this data, 
(c) Modified distance scheme applied to same data. 

As can be seen from the figure, the modified distance scheme has the boundary of the 
manifold in the interior of parameterization it produces (as docs the diffusion map process, 
although not as badly). This arise because the scheme places a great emphasis preserving 
the length of nearby neighbors, and in this case there is a circle of nearby neighbors with a 
perimeter that is much smaller than the perimeter of other circles that are in the interior of 
the 2D manifold. Clearly it is necessary to modify the lengths of some nearby neighbors to 
get a good parameterization. A clue to the need for this can be obtained from the size of 
additional eigenvalues of G and local changes in the components of their eigenvectors. Since 
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the method is equivalent to a PCA decomposition^, other eigenvalues must be large when 
we find that points significantly separated in the original space are close in the parameter 
space. How to systematically modify nearby neighbors distances in this case is a subject of 
ongoing research. 
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1 This is true if the distance matrix corresponds to a feasible configuration in real space. If it doesn't, 
for example because the distance inequality is not satisfied, we will get some complex eigenvalues and 
eigenvectors 
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